1.4 Data Source
🔎 1.4.3 Pushshift - Reddit API
The Pushshift Reddit API offers broad access to Reddit’s historical data and is not subject to the limits on data recency and query volume that Reddit’s own API imposes. It supports queries for submissions, comments, and subreddits, served from Pushshift’s own database, which is regularly refreshed with new content from Reddit. This makes it well suited to researchers and developers who want to analyze Reddit data over long periods. In practice, querying Pushshift from Python simplifies data extraction: a request can search comments or submissions, filter by subreddit, or exclude certain authors, and can return aggregates such as the most active subreddits for a given search term or the top upvoted comments. Professor Amit Arora collected all Reddit submissions and comments from January 2022 through January 2023 and archived the data in an S3 bucket.
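The kinds of queries described above can be sketched as follows. This is a minimal illustration, not the code used to build the archived dataset. It assumes the endpoint layout (`/reddit/search/comment/`, `/reddit/search/submission/`) and parameter names (`q`, `subreddit`, `author`, `size`, with a `!` prefix to exclude an author) from Pushshift's historical public documentation, which may not match the service as it runs today. The helper names `build_search_url` and `search` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed base URL, taken from Pushshift's historical public documentation.
PUSHSHIFT_BASE = "https://api.pushshift.io/reddit/search"


def build_search_url(kind, query=None, subreddit=None, exclude_authors=(), size=25):
    """Build a search URL for comments or submissions (hypothetical helper)."""
    if kind not in ("comment", "submission"):
        raise ValueError("kind must be 'comment' or 'submission'")
    params = {"size": size}
    if query:
        params["q"] = query
    if subreddit:
        params["subreddit"] = subreddit
    if exclude_authors:
        # Assumption: a leading "!" negates a filter value.
        params["author"] = ",".join(f"!{name}" for name in exclude_authors)
    return f"{PUSHSHIFT_BASE}/{kind}/?{urllib.parse.urlencode(params)}"


def search(kind, **filters):
    """Run the query and return the list of matching records.

    Assumption: the response is JSON with the results under a "data" key.
    """
    url = build_search_url(kind, **filters)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]
```

For example, `build_search_url("comment", query="dogecoin", subreddit="CryptoCurrency", exclude_authors=["AutoModerator"])` produces a URL that searches comments mentioning "dogecoin" in one subreddit while skipping a bot account. Building the URL separately from the network call makes the filters easy to inspect before any request is sent.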
💹 1.4.4 CryptoCompare - External Dataset
To complement our analysis of Reddit conversations, we used the CryptoCompare API to retrieve historical hourly price data for Dogecoin and Bitcoin covering all of 2022. This allowed us to use these cryptocurrencies as indicators of the wider crypto market’s trends over the year. Because the API returns at most 2000 records per request, we issued six separate queries to cover the full year. The resulting dataset contains 8762 entries for each cryptocurrency, giving a detailed view of Dogecoin’s price fluctuations relative to broader market dynamics. Combined with the Reddit data, these prices let us examine the relationship between social media discourse and market performance in the cryptocurrency sector.
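Working around a per-request cap like this usually means paging through the year in windows. The sketch below shows one way to do it, walking backwards in time and merging pages. It is an illustration rather than the exact queries used here, so the number of requests it makes depends on the window size and need not match the count above. It assumes CryptoCompare's `histohour` endpoint with the parameters `fsym`, `tsym`, `limit`, and `toTs`, and a response that nests the rows under `Data.Data`. An API key may also be required and is omitted. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from datetime import datetime, timezone

# Assumed endpoint and response layout; check the current API documentation.
HISTOHOUR_URL = "https://min-api.cryptocompare.com/data/v2/histohour"
MAX_LIMIT = 2000  # per-request cap described in the text
HOUR = 3600

# First and last hourly timestamps of 2022 in UTC (Unix seconds).
START_2022 = int(datetime(2022, 1, 1, tzinfo=timezone.utc).timestamp())
END_2022 = int(datetime(2022, 12, 31, 23, tzinfo=timezone.utc).timestamp())


def fetch_page(fsym, tsym, to_ts, limit=MAX_LIMIT):
    """Fetch one page of hourly rows ending at `to_ts` (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"fsym": fsym, "tsym": tsym, "limit": limit, "toTs": to_ts}
    )
    with urllib.request.urlopen(f"{HISTOHOUR_URL}?{query}", timeout=30) as resp:
        payload = json.load(resp)
    return payload["Data"]["Data"]  # assumed nesting of the row list


def fetch_hourly_range(fsym, tsym, start_ts, end_ts, fetch=fetch_page):
    """Page backwards from `end_ts` to `start_ts` and return rows sorted by time."""
    rows = {}
    to_ts = end_ts
    while to_ts >= start_ts:
        page = fetch(fsym, tsym, to_ts)
        if not page:
            break
        for row in page:
            if start_ts <= row["time"] <= end_ts:
                rows[row["time"]] = row  # keyed by timestamp to drop duplicates
        next_to = min(row["time"] for row in page) - HOUR
        if next_to >= to_ts:
            break  # no progress; avoid looping forever
        to_ts = next_to
    return [rows[t] for t in sorted(rows)]
```

A call such as `fetch_hourly_range("DOGE", "USD", START_2022, END_2022)` would then return the hourly rows for the year. Keying rows by timestamp keeps the result free of duplicates even if consecutive pages overlap, and passing `fetch` as a parameter lets the paging logic be checked against a stub without touching the network.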